Reverberation time estimation on the ACE corpus using the SDD method
Reverberation Time (T60) is an important measure for characterizing the
properties of a room. The author's T60 estimation algorithm was previously
tested on simulated data in which noise is artificially added to the speech
after convolution with impulse responses simulated using the image method. We
test the algorithm on speech convolved with real recorded impulse responses and
noise from the same rooms from the Acoustic Characterization of Environments
(ACE) corpus, and achieve results comparable to those obtained using simulated
data.
Comment: In Proceedings of the ACE Challenge Workshop - a satellite event of IEEE-WASPAA 2015 (arXiv:1510.00383)
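The simulated-data setup the abstract contrasts against can be sketched as follows. This is an illustrative reconstruction, not the authors' code: clean speech is convolved with an impulse response and noise is scaled to a target SNR before being added (all signals here are random stand-ins).

```python
import numpy as np

def make_noisy_reverberant(speech, rir, noise, snr_db):
    """Convolve clean speech with a room impulse response and add noise
    at a target SNR, as in typical T60-estimation test setups."""
    reverberant = np.convolve(speech, rir)[: len(speech)]
    noise = noise[: len(reverberant)]
    # Scale the noise so the reverberant-speech-to-noise power ratio is snr_db
    p_speech = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# Toy example: random signals stand in for speech, RIR and noise
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
rir = np.exp(-np.arange(4000) / 800.0) * rng.standard_normal(4000)
noisy = make_noisy_reverberant(speech, rir, rng.standard_normal(16000), snr_db=10.0)
```

The ACE-based test replaces the simulated `rir` and random noise with impulse responses and noise recorded in the same real rooms.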
Source Coding in Networks with Covariance Distortion Constraints
We consider a source coding problem with a network scenario in mind, and
formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance
matrix distortions. We define a notion of minimum for two positive-definite
matrices based on which we derive an explicit formula for the rate-distortion
function (RDF). We then study special cases and applications of this
result. We show that two well-studied source coding problems, namely remote
vector Gaussian Wyner-Ziv problems with mean-squared error and mutual
information constraints, are in fact special cases of our result. Finally, we
apply our results to a joint source coding and denoising problem. We consider a
network with a centralized topology and a given weighted sum-rate constraint,
where the received signals at the center are to be fused to maximize the output
SNR while enforcing no linear distortion. We show that one can design the
distortion matrices at the nodes in order to maximize the output SNR at the
fusion center. We thereby bridge denoising and source coding within
this setup.
Binaural Speech Enhancement Using STOI-Optimal Masks
STOI-optimal masking has been previously proposed and developed for
single-channel speech enhancement. In this paper, we consider the extension to
the task of binaural speech enhancement in which spatial information is known
to be important to speech understanding and therefore should be preserved by
the enhancement processing. Masks are estimated for each of the binaural
channels individually and a `better-ear listening' mask is computed by choosing
the maximum of the two masks. The estimated mask is used to supply probability
information about the speech presence in each time-frequency bin to an
Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that using
the proposed method on binaural signals with directional noise not only
improves the SNR of the noisy signal but also preserves the binaural cues and
intelligibility.
Comment: Accepted at IWAENC 202
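The 'better-ear listening' step described above is an element-wise maximum over the two per-channel masks. A minimal sketch, assuming the masks are time-frequency gain matrices in [0, 1] (shapes and values here are toy stand-ins): applying the same mask to both channels is what keeps the interaural level and phase differences intact.

```python
import numpy as np

def better_ear_mask(mask_left, mask_right):
    """Combine per-channel masks by taking the element-wise maximum.
    Applying this single mask to BOTH binaural channels preserves the
    interaural cues, since both channels receive identical gains."""
    return np.maximum(mask_left, mask_right)

# Toy time-frequency masks (frames x frequency bins)
rng = np.random.default_rng(1)
m_l = rng.uniform(size=(10, 257))
m_r = rng.uniform(size=(10, 257))
m = better_ear_mask(m_l, m_r)
```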
Graph neural networks for sound source localization on distributed microphone networks
Distributed Microphone Arrays (DMAs) present many challenges compared with
centralized microphone arrays. An important requirement of applications on
these arrays is handling a variable number of input channels. We consider the
use of Graph Neural Networks (GNNs) as a solution to this challenge. We present
a localization method using the Relation Network GNN, which we show shares many
similarities with classical signal processing algorithms for Sound Source
Localization (SSL). We apply our method for the task of SSL and validate it
experimentally using an unseen number of microphones. We test different feature
extractors and show that our approach significantly outperforms classical
baselines.
Comment: Presented as a poster at ICASSP 202
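The variable-channel property the abstract highlights comes from the Relation Network's structure: pairwise combinations of per-node features pass through one shared function and are summed. A minimal numpy sketch (weights, sizes, and feature dimensions are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(2)
W_g = rng.standard_normal((16, 2 * 8))   # shared pairwise relation weights
W_f = rng.standard_normal((2, 16))       # readout to an (x, y) position

def relation_network(node_feats):
    """Sum a shared function g over all ordered pairs of node features,
    then apply a readout f. Because the sum runs over pairs, the same
    weights accept any number of input channels (microphones)."""
    n = len(node_feats)
    pair_sum = np.zeros(16)
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = np.concatenate([node_feats[i], node_feats[j]])
                pair_sum += np.tanh(W_g @ pair)   # g(o_i, o_j)
    return W_f @ pair_sum                         # f(sum of relations)

# Works unchanged for 4 or 7 microphones
est4 = relation_network(rng.standard_normal((4, 8)))
est7 = relation_network(rng.standard_normal((7, 8)))
```

This channel-count invariance is what lets the trained model be validated on an unseen number of microphones.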
End-to-End Classification of Reverberant Rooms using DNNs
Reverberation is present in our workplaces, our homes and even in places
designed as auditoria, such as concert halls and theatres. This work
investigates how deep learning can use the effect of reverberation on speech to
classify a recording in terms of the room in which it was recorded.
Approaches previously taken in the literature for the task relied on handpicked
acoustic parameters as features used by classifiers. Estimating the values of
these parameters from reverberant speech involves estimation errors, inevitably
impacting the classification accuracy. This paper shows how DNNs can perform
the classification in an end-to-end fashion, operating directly on
reverberant speech. Based on the above, a method for the training of
generalisable DNN classifiers and a DNN architecture for the task are proposed.
A study is also made on the relationship between feature-maps derived by DNNs
and acoustic parameters that describe known properties of reverberation. The
experiments use Acoustic Impulse Responses (AIRs) measured in 7 real rooms. The
classification accuracy of DNNs is compared between the case of having access
to the AIRs and the case of having access only to the reverberant speech
recorded in the same rooms. The experiments show that with access to the AIRs a
DNN achieves an accuracy of 99.1% and with access only to reverberant speech,
the proposed DNN achieves an accuracy of 86.9%. The experiments replicate the
testing procedure used in previous work, which relied on handpicked acoustic
parameters, allowing the direct evaluation of the benefit of using deep
learning.
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
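The end-to-end idea above can be caricatured in a few lines. This is a deliberately tiny stand-in (random weights, a mean-pooled spectrogram, a linear layer) for what the paper implements with a trained DNN: the classifier consumes a time-frequency representation of the reverberant speech directly, with no intermediate estimation of acoustic parameters such as T60.

```python
import numpy as np

rng = np.random.default_rng(4)
n_rooms = 7
W = rng.standard_normal((n_rooms, 257))  # untrained stand-in weights

def classify_end_to_end(spectrogram):
    """Pool the magnitude spectrogram over time, then apply a linear
    layer and softmax over the 7 candidate rooms. A real model would
    use convolutional/recurrent layers in place of this linear map."""
    h = spectrogram.mean(axis=0)
    logits = W @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Stand-in magnitude spectrogram of reverberant speech (frames x bins)
spec = np.abs(rng.standard_normal((100, 257)))
probs = classify_end_to_end(spec)
```

The contrast with prior work is that nothing in this pipeline estimates an acoustic parameter first, so no parameter-estimation error can propagate into the classification.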
Dual input neural networks for positional sound source localization
In many signal processing applications, metadata may be advantageously used
in conjunction with a high dimensional signal to produce a desired output. In
the case of classical Sound Source Localization (SSL) algorithms, information
from high-dimensional, multichannel audio signals received by many
distributed microphones is combined with information describing acoustic
properties of the scene, such as the microphones' coordinates in space, to
estimate the position of a sound source. We introduce Dual Input Neural
Networks (DI-NNs) as a simple and effective way to model these two data types
in a neural network. We train and evaluate our proposed DI-NN on scenarios of
varying difficulty and realism and compare it against an alternative
architecture, a classical Least-Squares (LS) method as well as a classical
Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN
significantly outperforms the baselines, achieving a five times lower
localization error than the LS method and two times lower than the CRNN in a
test dataset of real recordings.
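The dual-input structure can be sketched as two branches whose embeddings are concatenated before the output layer. All layer sizes and weights below are illustrative assumptions, not the paper's architecture: one branch embeds the high-dimensional signal features, the other embeds the low-dimensional metadata (e.g. microphone coordinates).

```python
import numpy as np

rng = np.random.default_rng(3)
W_sig = rng.standard_normal((32, 512))   # signal branch (512-dim features)
W_meta = rng.standard_normal((8, 6))     # metadata branch (e.g. 3 mics x 2D)
W_out = rng.standard_normal((2, 40))     # joint readout -> (x, y) position

def di_nn(signal_feats, metadata):
    """Embed signal and metadata separately, concatenate the two
    embeddings, and regress the source position from the joint vector."""
    h_sig = np.tanh(W_sig @ signal_feats)
    h_meta = np.tanh(W_meta @ metadata)
    return W_out @ np.concatenate([h_sig, h_meta])

pos = di_nn(rng.standard_normal(512), rng.standard_normal(6))
```

Keeping the metadata in its own low-dimensional branch avoids forcing a few coordinates through the same layers as thousands of audio features, which is the design motivation the abstract alludes to.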
Adaptive inverse filtering of room acoustics
Equalization techniques for high-order, multichannel, FIR systems are important for the dereverberation of speech captured in reverberant rooms using multiple microphones. In this case, the multichannel system represents the Room Impulse Responses (RIRs). The existence of near-common zeros in multichannel RIRs can slow down the convergence rate of adaptive inverse filtering algorithms. In this paper, the effect of common and near-common zeros on both closed-form and adaptive inverse filtering algorithms is studied. Based on this study, an adaptive shortening algorithm for room acoustics is presented.
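The closed-form multichannel inversion the abstract refers to can be illustrated in the spirit of MINT-style equalization (sizes and signals here are illustrative, not the paper's): stack the convolution matrices of the M channel RIRs and solve for inverse filters whose combined output approximates a unit impulse. Exact inversion is possible precisely when the channels share no common zeros, which is why near-common zeros are problematic.

```python
import numpy as np

def conv_matrix(h, n_g):
    """Convolution (Toeplitz) matrix so that conv_matrix(h, n_g) @ g
    equals the linear convolution of h with a length-n_g filter g."""
    n = len(h) + n_g - 1
    H = np.zeros((n, n_g))
    for k in range(n_g):
        H[k:k + len(h), k] = h
    return H

rng = np.random.default_rng(5)
M, L_h, L_g = 2, 32, 40                  # channels, RIR length, filter length
rirs = [rng.standard_normal(L_h) for _ in range(M)]

# Stack per-channel convolution matrices: sum_m h_m * g_m ~= delta
H = np.hstack([conv_matrix(h, L_g) for h in rirs])
d = np.zeros(L_h + L_g - 1)
d[0] = 1.0                               # target: a unit impulse
g = np.linalg.lstsq(H, d, rcond=None)[0] # least-squares inverse filters
residual = np.linalg.norm(H @ g - d)     # ~0 when channels are coprime
```

Random RIRs are almost surely coprime, so the residual here is essentially zero; channels with common or near-common zeros would make `H` rank-deficient or ill-conditioned instead.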